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BACKGROUND 

This work concerns a technique to be used in the solution of 
optimal trajectory problems associated with kinetic energy 
weapons. In this problem, it is desired to solve for a control 
function (which might be thrust magnitude and direction of a 
gimbaled engine) in time in order to minimize time to intercept 
an enemy missile. 

Such problems are really infinite dimensional in nature 
(i.e., determining the control at each time point along the 
trajectory). However, in using a digital computer to solve such 
problems, certain operations occur which make the problem 
discrete and so viewable in a finite dimensional setting. For 
example, to numerically integrate the differential equations of 
motion, only values of thrust at a finite number of time points 
(typically, the beginning of each integration interval) affect 
the trajectory. 

The problem then is to determine these values so as to 
minimize the time to intercept. For any particular trajectory, 
this quantity is computed through a complicated flight equation 
simulation model. Also inherent in this computation is noise 
so that the computed time to intercept is really a noisy 
quantity. The current algorithm considers the noise in solving 
for the optimal control. 



INTRODUCTION 

We are concerned here with the problem of minimizing 
functions in which the only data available are function values. 
These values could be obtained by: a) observation and, 

therefore, be subject to measurement errors, or byb) simulation 
and so, be subject to computational errors of the simulation 
model. A difficulty present in any type of numerical 
optimization and often, in particular, when these errors exist, 
is "stalling". This is the inability of the algorithm to further 
reduce the function from its value at a non-minimizing point. 
The object of the current work is to postpone "stalling" as long 
as possible. 
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"Stalling" arises when the magnitudes of the true gradient 
and of the error in determining the gradient become comparable. 
In the current problem, graaients are estimated from noisy 
function values which can be expected to include an absolute 
noise component. As the minimizing point is approached, the 
magnitude of the true gradient decreases but the absolute noise 
present in estimating it does not so that "stalling" results. 

The procedure employed to delay "stalling" uses data 
smoothing in computations involving descent direction 
determination. The criterion of least squares is used to fit the 
data over a mesh of points. A secondary optimization problem is 
solved to optimize the location of the mesh points so that 
maximal descent direction determination accuracy will be 
achieved. 

This latter optimization is very demanding in the number of 
data points and, hence, also the number of function evaluations 
required. For this reason, the determination of the mesh is done 
by three methods henceforth referred to as A, B and C. These are 
of increasing sophistication and expense and are usea 
progressively as needed to provide continued decrease of the 
function f. Method B witn associated numerical results is 
discussed in this paper. Method A appears in [1] and method C is 
the subject of a future paper. 



DESCRIPTION OF THE ALGORITHM 

The basic scheme is described in [2]. The data is assumed 
to include an absolute noise that is bounded by the input 
parameter epsilon. It is possible to also include relative 
noise, however, for small function values in the "stalling" phase 
(these are the type of cases that were run) absolute errors 
dominate relative erros. Thus, absolute errors were concentrated 
on in this paper. 

The procedure of [2] consists of computing relaxed Newton 
and gradient steps. These directions are determined by using 
least squares to fit a quadratic smoothing polynomial over a 
local mesh of points surrounding the current estimate of the 
minimizing point. This fit yields first and second order 
coefficients for the function that define the Newton and gradient 
directions . 

Method A is the first method used to determine the mesh. It 
is based on the fact that in the least squares process the 
accuracy in estimating the coefficients of the smoothing 
polynomial is directly related to the error in function value 
differences across the mesh. The method proceeds as follows: 
the spacing along each coordinate axis is selected with the 
purpose of maintaining at least a minimum significance in the 
above referred to difference of function values. A linear model 
is assumed for the function and the spacing predicted on the 
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basis of it. In order to keep function evaluations to a minimum 
at the current estimate of minimizing point, the success in 
achieving the above purpose is not verified. At the next 
estimate of minimizing point, the spacing is updated, based on 
the previous estimate. This scheme has proven to be adequate 
from far-off starting points where, due to a large gradient 
magnitude, the determination of the mesh is not so critical. 

If, while performing mesh determination by method A, "stalling" 
occurs, the algorithm then switches to method B. This method has 
the same stated purpose as method A, but does not use a linear 
moael. Also, the predicted value of spacing is verified by 
function evaluation and solution is accomplished through an 
iterative technique. Furthermore, in order to minimize 
truncation errors, the smallest feasible spacing is used. 

In the event that "stalling" occurs while method B is being 
used for mesh determination, then the algorithm switches to 
method C. This method requires extensive computation. Hence, it 
is suited for use only in the final stages as the solution is 
approached. The goal now is to determine the mesh which 
minimizes the fitting error in the first and second order 
coefficients. This is done by adjusting the location of each 
mesh point so as to minimize the above error. 

There are two points to note here. First, the optimal 
adjustment of each coordinate of each mesh point would normally 
lead to a large dimensional problem. Second, since the true 
coefficients are not known, then reduction of the fitting error 
cannot be done directly. Both of these difficulties are 
discussed and procedures developed to overcome them in a future 
paper . 



MESH DETERMINATION BY METHOD B 

Optimal spacing is different for first and second order 
coefficients. Because of this, a separate mesh and fit is 
computed for each of these. Following is a description of the 
construction of the mesh used to estimate the first order 
coefficients. 

In addition to the absolute noise (discussed above) in 
function values, roundoff noise is present in representing these 
values in the computer. This is approximately equal to r|f(x)| 
where r is the roundoff in representing unity in the computer. 
The criterion used for mesh spacing in this paper is the 
following folk dictum: across the mesh, first order function 

differences (used in estimating first order coefficients) should 
retain at least approximately one-half the number of significant 
digits present in f(X) itself. In order to satisfy this, tne 
spacing for the jth coordinate axis (j=1,...,n) of the mesh is 
computed by iteration such that, with the notation listed next, 
then the formula below is true. The symbols 

f, Xg,hj , I j are respectively the function to be 
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minimizec, the currest estimate of minimizing point, tne spacing 
along the jth axis and the jth column of the identity matrix 



|f(XQ + h^I^) - f(XQ)| = \/2 (c + r I f (Xq) I ) 



The above equation is solved by a method of bisection, thus 
determining the mesh used for least squares. Only the first 
order coefficients of this fit are used in the algorithm since 
these are the only ones for which the mesh was adjusted. 

A scheme similar to the one above is used to determine the 
mesh for the second order coefficients. The distinction is in 
retaining the significance of second order differences. 
Analogous to the above, and using the notation there, the spacing 
is adjusted to satisfy 

|f(XQ + hjlj) + f(XQ - h^I.) - 2f(XQ)| :: \/ 4 (e + r| f (X^) I ) 



A new least squares determination is made with this mesh and 
the resulting hessian is used to compute a Newton direction. 
Next, line searches (described below) along both the Newton and 
gradient directions are made and the smallest function value used 
to determine the next estimate of the minimizing point. 



THE LINE SEARCH 

The first and second order coefficients resulting from the 

above fits are used to define the negative gradient direction 

D_ and the Newton direction D-. These vectors define the 
g n 

directions of separate one dimensional searches (described next) 
to obtain a value of f lower than f(XQ). That vector associated 

with the lowest attained value of f defines the direction and 
step for the next value of Xq about which to form meshes and 

restart the process. 

The one dimensional search for each vector consists of 
starting with an initial step size for Newton and gradient 
directions respectively sn = 1, sg = 2 / H(Xq) , (these stepsizes 

give guaranteed decreases in f along Newton and gradient 
direction) where H is the fitted Hessian of the function f. 
These values are halved repeatedly if f increases from f(Xg). 

Tne halving process continues until an input value lower bound 
exceeds the current step size in which case no decrease is 
assumed along that vector. If f is decreased from f(XQ) then 



repeated steps of that size are taken until f starts to increase. 
As soon as a local dip occurs in the value of f, then an 
o V er d e t er m i n ed quadratic is fit through the dip, the critical 
point determined and compared to the other sampled values of f 
along that vector and the lowest value recorded. 



NUMERICAL RESULTS 

Method B was tested in stand-alone mode, using test problems 
with standard starting values near the solution. Method A is 
counted on to provide the movement from far-off starts to points 
near the solution where method B will take over. For comparison 
purposes, the IMSL routine ZXMIN (a quasi-Newton method) and tne 
Nelder-Mead Simplex method were used. 

Since function values for the runs were computed (rather 
than measured) then the roundoff error of r f(X) outlined above 
should be replaced by the actual roundoff error in computing 
f(X). This was done as follows: The runs were made on a 
computer with roughly seven decimal places of accuracy (single 
precision). In order to approximate the roundoff in computing f 

at location X, a vector of |X|e”^ was added to X (as a bound on 
the single precision error in representing X) and this sum which 
shall be called X^^, was represented in double precision. Next, 

a double precision computation of f at X^ was performed. Calling 

this as fj(X^), then the difference | f^j(X^j)-f (X)| (where both 

function values were computed without input noise) was used as 
the roundoff error in computing f(X). 

Four standard problems with standard starting values were 
run for each of the algorithms. These are the Rosen broc k ,Bea 1 e 
and Freudenstein-Roth problems in two dimensions and the Helical 
Valley problem in three dimensions. For each problem the input 
noise bound epsilon was run at the levels of 0, .001 and .01. 
The input noise consisted of multiplying these values by the 
output of a uniform random number generator with mean 0 and 
variance 1. Runs were terminated when stalling occurred. The 
results are listed below with max | (AX^)!, |Af| being 

i 

respectively the maximum absolute component miss in achieving the 
minimizing point and the associated absolute miss in function 
value 
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PROBLEM & 
STARTING 
POINT 


INPUT 

NOISE 


METHOD B 
mx lAf| 


ZXMIN 
mx 1 Af 1 


SIMPLEX 
mx 1 Af 1 


ROSENBROCk 


0.000 


0.0 


0.0 


0.4E-3 


0.3E-7 


0.0 


0.0 


-1.2, U 


0.001 


0.104 


0.4E-2 


0.19 


1.4 


0.7E-2 


0.6E-3 




0.010 


0.330 


0.37E-1 


1.65 


0.76E-1 


0.368 


0.35E-1 


F-ROTH 


0.000 


0.0 


0.0 


O.lE-3 


0.1 E-3 


0.2E-3 


0.0 


0.5, -2. 


0.001 


0.0 


0.5E-3 


O.lE-1 


0.8E-3 


0.7E-4 


O.lE-3 




0.010 


0.16E-2 


0.21E-2 


O.lE-1 


0.8E-3 


O.lE-1 


0.33E-2 


HELICAL 


0.000 


0.2E-9 


0.2E-17 


0.2E-6 


0.6E-14 


0.6E-8 


0.5E-16 


VALLEY 


0.001 


0.3E-1 


0.8E-3 


0.479 


0.229 


0.15 


0.23E-1 


-1.0, 0.0, 0.0 


0.010 


0.11 


0.55E-2 


2.9 


8.48 


0.189 


0.26E-1 


BEALE 


0.000 


0.000 


l.E-13 


overflow 


0.00 


0.000 


10., 10. 


0.001 


0.39E-1 


0.7E-3 


overflow 


7.19 


0.297 




0.010 


0.119 


O.llE-1 


overflow 


7.26 


0.302 



CONCLUSIONS 

Generally, the best performance was exhibited either by 
method B or the simplex method. In particular, for the lowest 
values of input noise, these methods either shared or alternated 
in attaining the best results. However, for the highest input 
noise level, method B showed definite dominance over the simplex 
method . 
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